Abstract


Language  extensions  of  Fortran  are  being  developed  which  permit  the  user  to  map 
data  structures  to  the  individual  processors  of  distributed  memory  machines.  These 
languages  allow  a  programming  style  in  which  global  data  references  are  used.  Current 
efforts  are  focussed  on  designing  a  common  basis  for  such  languages,  the  result  of  which 
is known as High Performance Fortran (HPF). One of the central debates in the HPF
effort revolves around the concept of templates, introduced as an abstract index space,
to  which  data  could  be  aligned.  In  this  paper,  we  present  a  model  for  the  mapping 
of  data  which  provides  the  functionality  of  High  Performance  Fortran  distributions 
without the use of templates.


1  Introduction 


Much  current  research  activity  is  concentrated  on  providing  suitable  programming  tools  for 
distributed-memory  architectures.  One  focus  is  on  the  provision  of  appropriate  high-level 
language  constructs  to  enable  users  to  design  programs  in  much  the  same  way  as  they  are 
accustomed  to  on  a  sequential  machine.  Several  proposals  have  been  put  forth  in  recent 
months  for  a  set  of  language  extensions  to  achieve  this  [3,  4,  5,  6,  10],  in  particular  (but  not 
only)  for  Fortran. 

Recently,  a  coalition  of  researchers  from  industry,  government  labs  and  academia  formed 
the  High  Performance  Fortran  Forum  to  develop  a  standard  set  of  extensions  for  Fortran 
90  which  would  provide  a  portable  interface  to  a  wide  variety  of  parallel  architectures.  The 
forum  has  produced  a  draft  proposal  for  a  language,  called  High  Performance  Fortran  (HPF), 
which  focuses  mainly  on  issues  of  distributing  data  across  the  memories  of  a  distributed 
memory  multiprocessor. 

High  Performance  Fortran  (HPF)  adds  directives  to  Fortran  90  to  allow  the  user  to 
advise  the  compiler  on  the  allocation  of  data  objects  to  processor  memories.  The  three  basic 
elements  of  the  model  are: 

•  abstract  processors, 

•  distributions,  which  are  mappings  of  objects  to  abstract  processors, 

•  alignments,  which  are  mappings  of  data  objects  to  other  objects. 

The  distribution  of  an  object  (usually  an  array)  specifies  a  mapping  of  the  index  domain 
associated  with  the  object  to  the  index  domain  of  a  set  of  abstract  processors.  This  may 
be specified by the user: a) directly, by explicitly specifying suitable directives, or b)
indirectly, by using an alignment that relates the index domain of the array to the index
domain of another object whose distribution is known.

The  HPF  directives  provide  a  way  to  direct  the  compiler  to  ensure  that  certain  data 
objects  will  reside  in  the  same  processor.  The  underlying  motivation  is  that  an  operation 
on  two  or  more  data  objects  is  likely  to  be  carried  out  much  faster  if  they  all  reside  in  the 
same  processor,  and,  furthermore,  it  may  be  possible  to  carry  out  several  such  operations 
concurrently  if  they  can  be  performed  on  different  processors. 

Alignment  can  serve  as  a  bundling  mechanism:  once  many  arrays  are  aligned  to  the  same 
object,  then  they  can  be  distributed  onto  a  processor  arrangement  with  a  single  statement. 

In  general,  arrays  are  aligned  to  other  arrays.  However,  HPF  has  introduced  the  concept 
of  templates  to  be  used  as  an  alignment  base.  As  stated  in  the  HPF  language  specification  [8]: 


Sometimes  it  is  desirable  to  consider  a  large  index  space  with  which  several 
smaller  arrays  are  to  be  aligned,  but  not  to  declare  any  array  that  spans  the 
entire index space. HPF provides the notion of a TEMPLATE, which is like
an  array  whose  elements  have  no  content  and  therefore  occupy  no  storage;  it  is 
merely  an  abstract  index  space  that  can  be  distributed  and  with  which  arrays  may 
be  aligned. 

The  problem  with  this  approach  is  that  even  though  it  is  useful  in  some  special  situations, 
the  concept  of  templates  necessarily  complicates  the  whole  underlying  semantic  model.  Since 
templates  are  not  first  class  objects  in  the  language  (they  can  occur  only  in  directives),  they 
cannot  be  passed  across  procedure  boundaries,  and  thus  cannot  be  used  to  describe  the 
distributions  and  alignments  of  procedure  arguments.  Also,  as  currently  defined,  the  size 
of  templates  has  to  be  a  specification  expression  and  hence  templates  cannot  be  used  for 
describing  the  alignment  of  Fortran  90  allocatable  arrays. 

In  this  paper,  we  show  that  the  HPF  distribution  and  alignment  model  can  be  defined 
in  a  clear  and  concise  manner  without  templates,  while  retaining  the  intended  functionality. 

The  major  differences  between  the  current  HPF  draft  [8]  and  the  language  proposed  in 
this  paper  are  as  follows.  The  model  has  been  simplified  by: 

1.  Removing  template  directives. 

2.  Limiting  the  height  of  alignment  trees  to  1. 

3. Clarifying the role of processors by establishing a language-defined mapping to an
implementation-specific abstract processor arrangement.

4. Simplifying the passing of arguments to procedures by eliminating the INHERIT
attribute, matching alignments, and the TO-clause for dummy arguments.

At  the  same  time,  the  language  has  been  significantly  generalized  with  the  objective  of 
improving  object  program  performance.  In  particular: 

1.  Arrays  may  be  distributed  to  processor  sections. 

2. The set of distribution functions has been extended by including GENERAL_BLOCK.
This  allows  the  specification  of  irregular  block  distributions,  which  are  important  for 
the  support  of  load  balancing,  and  can  be  implemented  efficiently  [13]. 

3.  The  concept  of  distribution  functions  has  been  defined  in  a  general  way  so  that  future 
language  standards  may  easily  incorporate  more  general  mappings. 


The paper is organized as follows. In the next section we describe the model and
terminology underlying our proposal. The subsequent sections introduce the main language
extensions  -  processors,  distribution  directives  and  alignment  directives.  Issues  involving 
allocatable  arrays  and  procedures  are  treated  separately.  We  then  discuss  the  issues  arising 
due  to  HPF  templates  and  conclude  with  a  discussion  of  related  work. 

2  Model 

2.1  Index  Domains 

An index domain I of rank (dimension) n is an ordered set of subscript tuples that can be
represented by a subscript-triplet-list of length n (see Fortran 90 specification, R619). Each
element of an index domain is called an index; it represents an n-dimensional arrangement
of values. I is called a standard index domain iff the stride in each subscript triplet is 1.

Let A denote a declared data array (or processor array) that has been created. Then A
is associated with a standard index domain which we denote by $I^A$.

2.2  Distributions 

A  distribution  of  an  array  maps  each  array  element  to  one  or  more  processors  which  become 
the  owners  of  the  element  and,  in  this  capacity,  store  the  element  in  their  local  memory. 
We  model  distributions  by  mappings  between  the  associated  index  domains. 

Definition  1  Index  Mappings 

Let I, J denote two index domains. An index mapping from I to J is a total function
$\iota : I \to \mathcal{P}(J) - \{\emptyset\}$, where $\mathcal{P}(J)$ denotes the powerset of J.

Definition  2  Distributions 

Let A denote an array, and R a processor array. An index mapping $\delta^A_R$ from $I^A$ to $I^R$ is
called a distribution function for A with respect to R.

A distribution function $\delta^A_R$ - which is a mapping between index domains - induces an
associated element-based distribution that maps elements of A to one or more abstract
processors.*

Note  that  scalars  can  easily  be  accommodated  in  our  model  by  treating  them  as  if  they 
were  associated  with  an  index  domain  consisting  of  exactly  one  element. 

*Note that replication can be modeled as a special case of distribution, since every array element can be
distributed to an arbitrary (positive) number of processors.
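
As an informal illustration of Definitions 1 and 2, an index mapping can be modelled as a table from indices to non-empty sets of indices. The following Python sketch is not part of the proposal; Python serves only as executable notation, indices are represented as tuples, and the names `is_index_mapping`, `block` and `replicated` are hypothetical.

```python
def is_index_mapping(m, domain, codomain):
    """True iff m is total on `domain` and sends every index to a
    non-empty subset of `codomain` (Definition 1)."""
    codomain = set(codomain)
    return all(i in m and len(m[i]) > 0 and set(m[i]) <= codomain
               for i in domain)

# Index domains of an array A(1:4) and a processor array R(1:2).
I_A = [(i,) for i in range(1, 5)]
I_R = [(p,) for p in range(1, 3)]

# A distribution function: A(1:2) is owned by R(1), A(3:4) by R(2).
block = {(i,): frozenset({((i - 1) // 2 + 1,)}) for i in range(1, 5)}

# Replication as a special case: every element is owned by all processors.
replicated = {i: frozenset(I_R) for i in I_A}
```

Both `block` and `replicated` satisfy `is_index_mapping`, whereas a table that sends some index to the empty set does not.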
2.3 Alignment

Definition 3 Let A, B denote arbitrary arrays. An index mapping $\alpha^A_B$ from $I^A$ to $I^B$ is
called an alignment function for A with respect to B.

Definition  4  Construction  of  a  distribution 

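If A, B, $\delta^B_R$, and $\alpha^A_B : I^A \to \mathcal{P}(I^B) - \{\emptyset\}$ are given as above, then $\delta^A_R$ can be determined as
follows: For each $i \in I^A$:

$$\delta^A_R(i) = \bigcup_{j \in \alpha^A_B(i)} \delta^B_R(j)$$

We will express this relationship below in the form

$$\delta^A_R = \mathrm{CONSTRUCT}(\alpha^A_B, \delta^B_R).$$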

This can be verbally described as follows: if i is an index of A which is mapped to an
index j of B via the alignment function $\alpha$, then A(i) and B(j) are guaranteed to reside in
the same processor under any given distribution for B.
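
The construction can be made concrete with a small sketch. It assumes that CONSTRUCT assigns to A(i) the union of the owner sets of all B(j) with which i is aligned, which is the reading suggested by the verbal description; the function name `construct` and the sample data are hypothetical.

```python
def construct(alpha, delta_b):
    """Distribution of the alignee: the owners of A(i) are the owners of
    every B(j) with j in alpha(i)."""
    return {i: frozenset().union(*(delta_b[j] for j in js))
            for i, js in alpha.items()}

# B(1:4) in two blocks on processors 1 and 2.
delta_b = {1: {1}, 2: {1}, 3: {2}, 4: {2}}

# A(1:2) aligned so that A(i) sits with B(2*i).
alpha = {1: {2}, 2: {4}}

delta_a = construct(alpha, delta_b)
```

Here `delta_a` places A(1) on processor 1 and A(2) on processor 2, so each A(i) shares a processor with B(2*i) whatever distribution is chosen for B.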

2.4  The  Alignment  Relation 

For  the  following  discussion,  we  consider  the  data  space  A  of  all  arrays  that  are  accessible 
in  a  given  scope,  and  have  been  created,  at  a  given  time  during  the  execution  of  a  program 
unit. 

An alignment directive (see Section 5) establishes an alignment from an array $A_1$,
the alignee, to an array $A_2$, the alignment base. It defines an alignment function for $A_1$
with respect to $A_2$.

An  HPF  program  must  satisfy  the  following  constraints: 

1.  Each  array  occurring  as  an  alignment  base  must  not  be  aligned  to  another  array.  For 
such  an  array,  a  distribution  must  be  specified  directly. 

2.  Each  array  occurring  as  an  alignee  can  be  aligned  with  only  one  alignment  base. 

This enables us to represent A as an alignment forest, consisting of a set of alignment
trees. The nodes in the alignment forest represent arrays, and there is a directed edge from
B to A if and only if A is aligned to B. The height of alignment trees may be either 1 or
0. An alignment tree of height 0 is called degenerate: it consists of exactly one node that
represents an array which is not aligned to any other array, and to which no other array is
aligned.


Each  alignment  tree  T  has  a  uniquely  defined  root,  which  is  called  the  primary  array 
of  T.  All  other  nodes  of  T  are  called  secondary  arrays. 

Let  B  denote  a  primary  array.  Then  there  is  either  a  directive  which  explicitly  specifies 
a  distribution  for  B  or  B  is  implicitly  distributed  by  the  compiler.  Primary  arrays  are  the 
only  arrays  with  this  property. 

Let A denote an arbitrary secondary array of a tree with primary array B. Then there
exists an alignment function $\alpha$, describing the alignment from A to B. If $\delta^B_R$ is the distribution
of B, the distribution of A satisfies $\delta^A_R = \mathrm{CONSTRUCT}(\alpha, \delta^B_R)$.
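
The two constraints and the resulting forest of height at most 1 can be sketched as a small data structure. This is an illustration only; the class `AlignmentForest` and its method names are hypothetical, and arrays are identified by their names.

```python
class AlignmentForest:
    """Forest of height <= 1: every alignee points directly at a primary array."""

    def __init__(self, arrays):
        self.arrays = set(arrays)
        self.base_of = {}            # alignee -> its alignment base

    def align(self, alignee, base):
        if alignee == base:
            raise ValueError("an array cannot be aligned to itself")
        if base in self.base_of:
            raise ValueError(f"{base} is aligned to another array")      # constraint 1
        if alignee in self.base_of:
            raise ValueError(f"{alignee} already has an alignment base")  # constraint 2
        if alignee in self.base_of.values():
            raise ValueError(f"{alignee} is used as an alignment base")   # constraint 1
        self.base_of[alignee] = base

    def primaries(self):
        """Roots of all trees, including degenerate ones."""
        return self.arrays - set(self.base_of)

    def height(self, root):
        return 1 if root in self.base_of.values() else 0
```

After `align("A", "B")` and `align("C", "B")` on the arrays A, B, C, D, the primary arrays are B (height 1) and D (a degenerate tree).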

After  the  specification  part  of  a  unit  has  been  completely  processed,  the  alignment  forest 
can  be  constructed  for  the  set  of  all  arrays  that  are  accessible  and  have  already  been  created. 
This  is  the  initial  state  for  the  actual  alignment  forest  associated  with  the  processing  of  the 
executable  part  of  the  program.  The  structure  of  the  forest  may  change  dynamically  during 
execution  as  a  result  of  executing  REDISTRIBUTE  and  REALIGN  directives,  ALLOCATE 
and  DEALLOCATE  statements,  and  procedure  calls. 

For  the  details  of  these  manipulations  see  Sections  4.2,  5.2,  and  7.  Distribution  and 
alignment  functions  are  explained  in  Sections  4  and  5,  respectively. 

3  The  Processors  Directive 

Each implementation of HPF determines uniquely an implicit abstract processor
arrangement, AP, which specifies a linear numbering scheme for the physical processors of
the underlying machine.

The PROCESSORS directive declares one or more processor arrangements, each of which
may be either a processor array arrangement or a conceptually scalar processor
arrangement.

The  specification  of  a  processor  arrangement  determines  the  name  and,  in  the  case  of  a 
processor  array  arrangement,  a  non-empty  index  domain.  It  must  appear  in  the  specification 
part  of  a  program  unit. 

Each  processor  arrangement  is  mapped  to  AP  in  the  same  way  as  storage  association  is 
defined  for  the  Fortran  90  EQUIVALENCE  statement,  with  abstract  processors  playing  the 
role  of  the  storage  units  (see  Fortran  90  specification,  5.5.1).  The  sharing  of  an  abstract 
processor  implies  the  sharing  of  the  associated  physical  processor. 
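
To illustrate the analogy with storage association, the sketch below numbers the elements of a processor array arrangement in Fortran array element order (column-major). It assumes, as for arrays whose first elements are made equivalent, that every arrangement begins at the first abstract processor of AP; the function `ap_number` and the arrangements P and Q are hypothetical.

```python
def ap_number(index, bounds):
    """1-based column-major position of `index` in an arrangement.

    `bounds` lists one (lower, upper) pair per dimension.
    """
    position, stride = 0, 1
    for i, (lo, hi) in zip(index, bounds):
        if not lo <= i <= hi:
            raise IndexError(f"{i} is outside [{lo}:{hi}]")
        position += (i - lo) * stride
        stride *= hi - lo + 1
    return position + 1

# Hypothetical declaration: !HPF$ PROCESSORS P(2,3), Q(6)
P_BOUNDS = [(1, 2), (1, 3)]
Q_BOUNDS = [(1, 6)]
```

Under these assumptions P(1,2) and Q(3) both map to abstract processor 3 and therefore share a physical processor.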

Depending on the target architecture, data distributed to a (conceptually) scalar
processor arrangement may reside in a single control processor (if the machine has one), or
may reside in an arbitrarily chosen processor, or may be replicated over all processors. The
language does not specify a relationship between different scalar processor arrangements.


4  Distribution  Directives 

The  DISTRIBUTE  directive  specifies  the  distribution  (Section  2.2)  of  one  or  more  arrays, 
the  distributees,  by  establishing  for  each  distributee  a  mapping  between  its  index  domain 
and  the  index  domain  of  the  distribution  target,  which  is  either  a  processor  array  or  a 
section  thereof.  The  distribution  target  is  specified,  after  the  keyword  TO,  in  a  TO-clause. 
The  mapping  between  distributee  and  processor  array  can  be  specified  either  explicitly, 
as  a  distribution  format  list,  or  as  an  inherited  distribution.  The  elements  in  the 
distribution  format  list  are  associated  with  the  dimensions  of  the  distributee;  each  element 
is  one  of  the  following: 

1.  BLOCK 

2.  GENERAL_BLOCK(restricted-expression) 

3. CYCLIC[(specification-expression)]

4. :

The  meaning  of  these  elements  will  be  discussed  below.  Inherited  distributions  will  be 
discussed  in  Section  7. 

Examples: 

!HPF$ DISTRIBUTE A(BLOCK)

!HPF$ DISTRIBUTE B(CYCLIC) TO Q(1:NOP:2)

!HPF$ DISTRIBUTE C(GENERAL_BLOCK(S))

!HPF$ DISTRIBUTE (BLOCK,:) :: E,F

4.1  Determining  an  Array  Distribution 

Let A denote an array of rank n which is not a dummy argument, and assume that R is
the associated distribution target (explicitly or implicitly specified). The distribution of A is
specified by a list of distribution formats. The length of this list must be n. A distribution
format of ":" specifies that the corresponding array dimension is not being distributed. The
rank of R must be n, reduced by the number of colons in the distribution format list. The
non-colon entries in the distribution format list are matched from left to right to the dimensions
of R. For each such entry, a distribution function is determined according to the rules defined
below. Here we assume both the array and the processor array are one-dimensional, with
index domains $I^A = [1 : N]$ and $I^R = [1 : NP]$. We will define the functions associated
with the distribution formats by specifying the associated distributions, which will be simply
denoted by $\delta$.
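
The left-to-right matching of non-colon entries to the dimensions of R can be sketched as follows. The helper `match_formats` is hypothetical; dimensions are numbered from 0, and a format list is given as a list of strings in which ":" marks a dimension that is not distributed.

```python
def match_formats(formats, target_rank):
    """Map each distributed array dimension to a dimension of the target R."""
    distributed = [d for d, f in enumerate(formats) if f != ":"]
    if len(distributed) != target_rank:
        raise ValueError(
            f"target rank must be {len(distributed)}, not {target_rank}")
    return {array_dim: target_dim
            for target_dim, array_dim in enumerate(distributed)}
```

For example, `match_formats(["BLOCK", ":", "CYCLIC"], 2)` pairs array dimension 0 with target dimension 0 and array dimension 2 with target dimension 1, while a target of rank 3 is rejected.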
4.1.1 Block Distributions

The block distribution function is specified by the distribution format BLOCK; it divides
the array into contiguous blocks whose sizes are identical, except possibly for the last block,
which may be of a smaller size. More precisely, let $q := \lceil N/NP \rceil$. Then:

• $\delta(i) = \{j\}$ for all i, $1 \le i \le N$, where $j = \lceil i/q \rceil$.

• The local index associated with element A(i) in processor R(j) is $i - (j - 1) * q$.
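
A minimal sketch of the block rule, assuming the block size is q = ceil(N/NP) and that the owner of A(i) is processor ceil(i/q); the function name `block_owner` is hypothetical.

```python
def block_owner(i, n, np_):
    """Return (owning processor, local index) of A(i) for A(1:n) on R(1:np_)."""
    if not 1 <= i <= n:
        raise IndexError(f"{i} is outside [1:{n}]")
    q = -(-n // np_)          # block size: ceil(n / np_)
    j = -(-i // q)            # owning processor: ceil(i / q)
    return j, i - (j - 1) * q
```

With N = 10 and NP = 4 the block size is 3, so processors 1 to 3 hold three elements each and processor 4 holds only A(10), with local index 1.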

4.1.2  General  Block  Distributions 

A distribution format for a general block distribution is of the form GENERAL_BLOCK(G),
where G is an integer array with index domain [1 : M], where $M \ge NP - 1$.

A is partitioned into NP contiguous blocks. For all i, $1 \le i < NP$, G(i) specifies the upper
bound of block i. The index range associated with block 1 is [1 : G(1)]; for 1 < i < NP,
[G(i - 1) + 1 : G(i)] is the index range of block i; and [G(NP - 1) + 1 : N] is the index range
of block NP.
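
The owner of an element under a general block distribution can be found by a binary search over the block upper bounds. The sketch assumes that the bounds in G are non-decreasing and that only the first NP - 1 entries are used; the function `general_block_owner` is hypothetical, and G(1) is stored at list position 0.

```python
from bisect import bisect_left

def general_block_owner(i, g, n, np_):
    """Processor owning A(i) for A(1:n) split into np_ blocks by bounds g."""
    if not 1 <= i <= n:
        raise IndexError(f"{i} is outside [1:{n}]")
    # Upper bounds of blocks 1..np_; the last block always ends at n.
    bounds = list(g[:np_ - 1]) + [n]
    # The owner is the first block whose upper bound is >= i.
    return bisect_left(bounds, i) + 1
```

For N = 10, NP = 3 and G = (2, 7), the blocks are [1:2], [3:7] and [8:10]. Equal consecutive bounds yield an empty block, which can be useful when balancing load.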

4.1.3  Cyclic  Distributions 

Block-cyclic distributions are specified by the distribution format CYCLIC(k), with an
argument, $k \ge 1$, of type integer. CYCLIC(k) defines contiguous segments of length k and
maps them cyclically to the processors. The distribution function is given as follows:

$$\delta(i) = \{\mathrm{MODULO}(\lfloor (i - 1)/k \rfloor, NP) + 1\} \quad \text{for all } i,\ 1 \le i \le N$$

Cyclic distributions are specified by the distribution format CYCLIC. This is equivalent
to CYCLIC(1).
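
A minimal sketch of the block-cyclic rule, assuming that segment number floor((i-1)/k) is dealt to the processors round-robin starting with processor 1; the function name `cyclic_owner` is hypothetical.

```python
def cyclic_owner(i, np_, k=1):
    """Processor owning A(i) under CYCLIC(k) on processors 1..np_."""
    if i < 1 or k < 1:
        raise ValueError("i and k must be positive")
    return ((i - 1) // k) % np_ + 1
```

With k = 1 the elements are dealt out one at a time (1, 2, 3, 1, 2, 3, ... on three processors), which is the plain CYCLIC case; with k = 2 on two processors the owners of A(1:8) are 1, 1, 2, 2, 1, 1, 2, 2.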

4.2  The  REDISTRIBUTE  Directive 

The  REDISTRIBUTE  directive  is  syntactically  similar  to  the  DISTRIBUTE  directive  but 
may  appear  only  in  the  execution  part  of  a  program  unit.  It  is  used  for  dynamically  changing 
the  distribution  of  an  array  and  may  only  be  used  for  arrays  that  have  been  declared  as 
DYNAMIC. 

If an array B is redistributed, then every array A that is aligned to B is redistributed in
such a way that the relationship expressed by the alignment function linking A to B is kept
invariant (see Section 2.4). If B is a secondary array at the time of redistribution, then the
actual alignment forest changes as follows: B is disconnected from its primary array and made
into a new degenerate tree with primary array B.
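
The effect of REDISTRIBUTE on the alignment forest can be sketched as follows. The representation is an assumption made for illustration: `base_of` maps each secondary array to its primary array, `alpha` holds the alignment function of each secondary array, and `delta` holds the current distribution of every array; the function `redistribute` is hypothetical.

```python
def redistribute(b, new_delta, base_of, alpha, delta):
    """Apply REDISTRIBUTE to array `b`, updating the three tables in place."""
    # A secondary array leaves its tree and becomes a degenerate tree.
    base_of.pop(b, None)
    alpha.pop(b, None)
    delta[b] = new_delta
    # Arrays aligned to b follow it, keeping their alignment invariant.
    for a, base in base_of.items():
        if base == b:
            delta[a] = {i: frozenset().union(*(new_delta[j] for j in js))
                        for i, js in alpha[a].items()}
```

Redistributing a primary array thus moves its whole tree, whereas redistributing a secondary array affects only that array.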
5 Alignment Directives

The ALIGN directive is used to distribute data objects indirectly, by specifying one or more
direct alignment relationships and the associated alignment functions (see Sections 2.3 and
2.4).

Every axis of the alignee is specified as either ":" or "*" or an align-dummy, which is a
scalar integer variable. If it is ":" then positions along that axis will be spread out across the
matching axis of the alignment base; if it is "*" then that axis is collapsed: positions along
that axis make no difference in determining the corresponding position of the alignment base.
(Replacing the "*" with an align-dummy not used anywhere else in the directive would have
the same effect; thus this notation is a convenience only.) An align-dummy is considered to
range over all valid index values for that dimension of the alignee.

Each  element  of  the  alignee  is  aligned  with  all  corresponding  positions  of  the  alignment 
base. 


5.1  Determining  the  Alignment  Function 

This section describes how an ALIGN directive specifies the alignment function associated
with the direct alignment relationship between alignee and alignment base. Let

• A denote the alignee, and $I^A = [L_1 : U_1, \ldots, L_n : U_n]$

• B denote the alignment base, and $I^B = [L'_1 : U'_1, \ldots, L'_m : U'_m]$

The alignment function mapping $I^A$ to the power set of $I^B$ will be denoted by $\alpha$.
Assume that the directive has the form

ALIGN A($s_1, \ldots, s_n$) WITH B($t_1, \ldots, t_m$)

where

• each $s_i$ is ":", "*", or an align-dummy

• each $t_j$ is a base-subscript. This can be any of the following cases:

- a dummyless-expr, i.e., a scalar integer expression in which no align-dummy occurs

- a dummy-use-expr, i.e., a scalar integer expression in which exactly one
align-dummy occurs

- a subscript-triplet

- "*"
We explain the construction of $\alpha$ by first applying a sequence of transformations to the
directive which eliminate ":" and "*" in the alignee, and subscript-triplets as well as "*" in
the base-subscript list. The transformations are specified as follows:

• Assume that $s_i$ = ":" matches the subscript triplet $t_j$ = [LT : UT : ST]. Then
$U_i - L_i + 1 \le \mathrm{MAX}(\mathrm{INT}((UT - LT + ST)/ST), 0)$ must hold. The positions in axis i
of the alignee are spread out across axis j of the alignment base:
$s_i$ is replaced by a new align-dummy J, and $t_j$ is replaced by the expression
$(J - L_i) * ST + LT$. (This is analogous to array assignment.)

• Assume that $s_i$ = "*". Then axis i of the alignee is collapsed:
$s_i$ is replaced by a new align-dummy J which occurs nowhere else.

• Assume that $t_j$ = "*". This denotes replication:
$B(t_1, \ldots, t_{j-1}, *, t_{j+1}, \ldots, t_m)$ is replaced by the set
$\{B(t_1, \ldots, t_{j-1}, k, t_{j+1}, \ldots, t_m) \mid L'_j \le k \le U'_j\}$.
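
The first transformation can be sketched as a function that checks the extent condition and returns the linear expression substituted for the triplet. The sketch assumes that the number of positions selected by a triplet is MAX(INT((UT - LT + ST)/ST), 0), as for Fortran 90 array sections; the names `triplet_extent` and `spread_axis` are hypothetical.

```python
def triplet_extent(lt, ut, st):
    """Number of positions selected by the subscript triplet lt:ut:st."""
    if st == 0:
        raise ValueError("stride must be non-zero")
    return max(int((ut - lt + st) / st), 0)

def spread_axis(li, ui, lt, ut, st):
    """Return the map J -> (J - li) * st + lt for an alignee axis [li:ui]
    that is spread across the triplet lt:ut:st of the alignment base."""
    if ui - li + 1 > triplet_extent(lt, ut, st):
        raise ValueError("alignee axis is longer than the subscript triplet")
    return lambda j: (j - li) * st + lt
```

For an alignee axis [1:4] matched with the triplet 2:8:2, the align-dummy J is mapped to 2*J, so the four positions land on 2, 4, 6 and 8. An axis [1:5] would not fit and is rejected.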

By applying these transformations until neither the alignee nor the alignment base contains
positions with either ":" or "*", we obtain:

• a reduced alignee of the form $A(J_1, \ldots, J_n)$, where the $J_i$ are distinct align-dummies.
The range of $J_i$ is given by $[L_i : U_i]$.

• an alignment base set ABS, every element of which has the form $B(y_1, \ldots, y_m)$, where
each $y_j$ is either a dummyless-expr or a dummy-use-expr. The operators +, -, and *
may be applied to form expressions which are linear in the align-dummy. Since
linear expressions cannot handle some frequently occurring cases, such as truncation at
either end of the alignment, we also allow the intrinsic functions MAX, MIN, LBOUND,
UBOUND, and SIZE to be used in alignment functions. Each $J_i$ may occur in at most
one $y_j$ (this excludes the possibility of specifying skew alignments).

The  basic  rules  for  determining  a  are  now  as  follows: 

1. Select an arbitrary tuple $j = (j_1, \ldots, j_n)$, where each $j_i$ is a value in the range of $J_i$,
and substitute $j_i$ for each occurrence of $J_i$ in ABS.

2. Evaluate all expressions in the modified set ABS; this evaluation is performed modulo
the extent of the associated dimension of the alignment base: the value y associated
with dimension j is replaced by $\mathrm{MIN}(U'_j, y)$.
Example:


REAL A(1:N), D(1:N,1:M)

!HPF$ ALIGN A(:) WITH D(:,*)

aligns a copy of A with every column of D. The reduced alignee has the form A(J), where the
range of J is [1 : N]. For the alignment base set we obtain: ABS = $\{D(J,k) \mid 1 \le k \le M\}$.
Hence, $\alpha(J) = \{(J,k) \mid 1 \le k \le M\}$ for each $J \in [1 : N]$.

Example: 

REAL B(1:N,1:M), E(1:N)

!HPF$ ALIGN B(:,*) WITH E(:)

Here, the reduced alignee has the form $B(J_1, J_2)$, where the range of $J_1$ is [1 : N] and
the range of $J_2$ is [1 : M]. For the alignment base set we obtain: ABS = $\{E(J_1)\}$. Thus,
$\alpha(J_1, J_2) = \{(J_1)\}$ for each $J_1 \in [1 : N]$ and $J_2 \in [1 : M]$.
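
The alignment functions of the two examples above can be tabulated directly. The following sketch is illustrative only; indices are tuples, and the function names `align_replicated` and `align_collapsed` are hypothetical.

```python
def align_replicated(n, m):
    """alpha for ALIGN A(:) WITH D(:,*): A(J) -> {D(J,k) | 1 <= k <= m}."""
    return {(j,): frozenset((j, k) for k in range(1, m + 1))
            for j in range(1, n + 1)}

def align_collapsed(n, m):
    """alpha for ALIGN B(:,*) WITH E(:): B(J1,J2) -> {E(J1)}."""
    return {(j1, j2): frozenset({(j1,)})
            for j1 in range(1, n + 1)
            for j2 in range(1, m + 1)}
```

In the first case each element of A is aligned with M positions of D (replication); in the second, all M elements of a row of B are aligned with the same element of E (collapse).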

5.2  The  REALIGN  Directive 

The  REALIGN  directive  is  syntactically  similar  to  the  ALIGN  directive  but  may  appear  only 
in  the  execution-part  of  a  program  unit.  It  is  used  for  dynamically  changing  the  alignment 
of  an  array  and  again  may  only  be  used  for  arrays  that  have  been  declared  as  DYNAMIC. 

Assume that A is the alignee, B the base array, with distribution $\delta^B_R$, and $\alpha$ the alignment
function determined by the REALIGN directive. Then the actual alignment forest is modified
as described by the steps below:

1.  If  A  is  a  primary  array  at  the  root  of  a  non-degenerate  tree  immediately  before  ex¬ 
ecution  of  the  REALIGN  directive,  then  all  secondary  arrays  associated  with  A  are 
disconnected  from  A  and  made  into  primary  arrays  of  degenerate  trees  with  their 
current  distribution. 

If A is a secondary array with associated primary array B', then A is disconnected
from B'. (Note that B' = B is possible.)

2. A is made a new secondary array of B.

3. The distribution of A is determined as $\delta^A_R = \mathrm{CONSTRUCT}(\alpha, \delta^B_R)$.
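
Steps 1 and 2 can be sketched on a table `base_of` that maps each secondary array to its primary array; step 3 then recomputes the distribution of A from the alignment function and is omitted here. The function `realign` and this representation are hypothetical.

```python
def realign(a, b, base_of):
    """Apply REALIGN a WITH b to the forest stored in `base_of` (in place)."""
    # b must be a primary array once step 1 has been carried out.
    if a == b or (b in base_of and base_of[b] != a):
        raise ValueError(f"{b} cannot serve as alignment base for {a}")
    # Step 1: arrays aligned to a become primary arrays of degenerate trees,
    # and a itself is disconnected from its former primary array, if any.
    for x in [x for x, base in base_of.items() if base == a]:
        del base_of[x]
    base_of.pop(a, None)
    # Step 2: a becomes a secondary array of b.
    base_of[a] = b
```

Note the case in which B was aligned to A before the directive: step 1 turns B into a primary array, after which A can be aligned to it.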
6 Allocatable Arrays

Distribution and alignment for variables with the ALLOCATABLE attribute may be
specified using DISTRIBUTE or ALIGN directives. These directives may occur in the
specification-part of a program unit just as for other arrays: the associated attributes are
propagated to each associated ALLOCATE statement. Such variables may also be used in
REDISTRIBUTE and REALIGN directives.

In  the  following  example,  distributions  are  specified  for  the  allocatable  arrays  A,  C  and 
D  which  are  valid  for  each  allocation  instance.  When  C  is  allocated  in  the  instance  shown, 
it  is  given  a  cyclic  distribution  in  the  executable  REDISTRIBUTE  directive.  At  the  time 
ALLOCATE  is  applied  to  an  array  B ,  the  array  is  created  according  to  the  alignment  given 
in  the  executable  REALIGN  statement.  The  actual  alignment  forest  is  modified  by  entering 
B  as  a  new  element  in  the  position  determined  by  the  alignment  relationships  involving  B. 
At the time DEALLOCATE is applied to B, the array is removed from the alignment forest
and each array A directly aligned to B is made into a new tree with primary A. Note that a
local  array  which  is  not  declared  ALLOCATABLE  cannot  be  aligned  in  the  specification-part 
of  a  program  unit  to  an  allocatable  array. 

Example: 

REAL, ALLOCATABLE(:,:) :: A,B
REAL, ALLOCATABLE(:) :: C,D
!HPF$ PROCESSORS PR(32)

!HPF$ DISTRIBUTE A(CYCLIC,BLOCK)

!HPF$ DISTRIBUTE (BLOCK) :: C,D
!HPF$ DYNAMIC B,C


READ 6,M,N
ALLOCATE(A(N*M,N*M))

ALLOCATE(B(N,N))

!HPF$ REALIGN B(:,:) WITH A(M::M,1::M)
ALLOCATE(C(10000), D(10000))
!HPF$ REDISTRIBUTE C(CYCLIC) TO PR
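
The REALIGN directive in this example can be checked by applying the triplet rule of Section 5.1 to both axes. The sketch assumes that an omitted upper bound in a triplet stands for the upper bound N*M of the corresponding dimension of A; the function names are hypothetical.

```python
def realign_target(i, j, m):
    """Element of A with which B(i,j) is aligned under
    REALIGN B(:,:) WITH A(M::M,1::M)."""
    return (m + (i - 1) * m, 1 + (j - 1) * m)

def fits(n, m):
    """True iff every B(i,j), 1 <= i,j <= n, lands inside A(1:n*m,1:n*m)."""
    return all(1 <= t <= n * m
               for i in range(1, n + 1)
               for j in range(1, n + 1)
               for t in realign_target(i, j, m))
```

For N = 3 and M = 2, B(1,1) is aligned with A(2,1) and B(3,3) with A(6,5), so B is spread over every M-th row and column of A.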
7 Procedures


The  distribution  of  dummy  arguments  can  be  specified  as  shown  below;  it  can  also  be 
specified  by  giving  an  alignment  to  another  dummy  argument  or  a  local  data  object  in  the 
usual  way.  Further,  a  local  data  object  may  be  aligned  to  a  dummy  argument. 

The alignment tree, as defined in Section 2.4, is local to a procedure. Thus, an array
which is the actual argument of a procedure call is not connected with its alignment tree in
the calling unit during execution of the called procedure.

If  a  dummy  argument  is  redistributed  or  realigned  during  execution  of  the  procedure, 
then  the  original  distribution  must  be  restored  on  procedure  exit. 

The  distribution  of  a  dummy  argument  A  can  be  specified  in  four  different  ways: 

1.  explicitly  by  providing  a  distribution  specification  of  the  form: 

DISTRIBUTE  A  d  [TO  r] 

where d is a parenthesized distribution format list, and r is the distribution target. Here,
the distribution of the actual argument is changed, if necessary, to the distribution
determined by the specification (see Section 4.1). If necessary, the distribution of A
before the call has to be restored upon exit from the procedure.

2.  by  inheritance,  syntactically  expressed  by: 

DISTRIBUTE  A  * 

In  this  case,  the  distribution  of  the  actual  argument  is  transferred  into  the  procedure 
and  inherited  by  A. 

3.  by  inheritance  matching,  syntactically  expressed  by: 

DISTRIBUTE  A  *  d  [TO  r] 

A  specification  of  this  form  indicates  that  the  distribution  of  the  actual  argument  is 
transferred  into  the  procedure  and  inherited  by  A.  However,  if  this  distribution  does 
not  match  the  above  specification,  then  the  program  is  not  HPF-conforming. 

If  this  distribution  attribute  of  the  dummy  is  known  within  the  calling  routine  (through 
the  use  of  interface  blocks,  for  example),  then  the  language  processor  will  arrange  for 
remapping  the  actual  argument  to  the  specified  distribution  (and  mapping  it  back  on 
return  from  the  subprogram,  if  necessary).  If  the  distribution  attribute  of  the  dummy 
is  not  made  available  when  the  caller  is  compiled,  the  onus  is  on  the  programmer  to 
arrange  for  proper  distribution  of  the  actual  argument. 
4. implicitly: No explicit distribution is specified (directly or indirectly). In this case,
the  compiler  provides  an  implicit  distribution  specification. 
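
The four cases can be summarized in a small decision function. This is a sketch under simplifying assumptions: distributions are opaque, comparable descriptions, "matching" is taken to mean equality, and the names `bind_dummy` and `compiler_default` are hypothetical.

```python
def bind_dummy(mode, actual, spec=None, compiler_default=None):
    """Return (distribution of the dummy, whether the actual is remapped)."""
    if mode == "explicit":          # DISTRIBUTE A d [TO r]
        return spec, actual != spec
    if mode == "inherit":           # DISTRIBUTE A *
        return actual, False
    if mode == "inherit_match":     # DISTRIBUTE A * d [TO r]
        if actual != spec:
            raise ValueError("program is not HPF-conforming")
        return actual, False
    if mode == "implicit":          # no specification: the compiler decides
        return compiler_default, actual != compiler_default
    raise ValueError(f"unknown mode {mode!r}")
```

Whenever the second component is true, the original distribution of the actual argument has to be restored on exit from the procedure.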

8 The Template Directive in High Performance Fortran

In  the  above  sections,  we  have  presented  a  model  for  mapping  of  data  to  processor  memories 
without  using  templates.  We  claim  that  the  HPF  template  directives  are  limited  in  their 
applicability  and  give  rise  to  serious  problems  in  the  specification  of  the  language,  without 
adding  any  significant  functionality. 

Template  directives,  which  may  occur  only  in  the  specification  part  of  a  (sub)program, 
result  in  the  creation  of  a  template.  Although  the  language  definition  states  that  “templates 
are  just  abstract  index  spaces”,  it  postulates  in  other  places  that  distinct  definitions  of 
templates  in  the  same  or  different  scopes  are  to  be  considered  as  different,  independent 
of  their  associated  index  domain.  As  a  consequence,  each  template  created  in  a  program 
execution  must  be  interpreted  as  a  tagged  index  domain. 

The  discussion  in  the  rest  of  this  section  does  not  include  the  so-called  “natural  templates” 
of  HPF:  they  represent  the  index  domain  associated  with  an  array  and  are  thus  implicitly 
part  of  our  proposal.  In  fact,  our  claim  could  be  rephrased  as  saying  that  “natural  templates” 
are  sufficient  to  describe  all  features  related  to  distribution  and  alignment. 

8.1  The  Usefulness  of  Templates 

Templates  have  been  perceived  to  have  two  separate  uses  within  the  language.  We  discuss 
each  of  these  briefly. 

8.1.1  Alignment  of  Staggered  Grids 

The  first  use  of  templates  is  to  enable  the  specification  of  alignment  between  arrays  where 
there  is  no  appropriate  common  index  domain:  this  can  occur  whenever  two  or  more  arrays 
are  each  associated  with  different  parts  of  a  physical  grid  which  do  not  completely  overlap. 

Before we discuss the general case, we consider the example posted on the HPFF Distribution
mailing list by C. A. Thole:

REAL U(0:N,1:N), V(1:N,0:N), P(1:N,1:N)
!HPF$ TEMPLATE T(0:2*N,0:2*N)
!HPF$ ALIGN P(I,J) WITH T(2*I-1,2*J-1)
!HPF$ ALIGN U(I,J) WITH T(2*I,2*J-1)
!HPF$ ALIGN V(I,J) WITH T(2*I-1,2*J)

P = U(0:N-1,:) + U(1:N,:) + V(:,0:N-1) + V(:,1:N)

For  the  above  code,  the  claim  was  made  that: 

1.  Only  a  template  with  a  larger  index  domain  than  any  of  the  arrays  involved  represents 
the  nature  of  the  physical  grid  structure  correctly. 

2.  Therefore  the  template  T  is  required  to  specify  the  relationship  between  the  data  objects 
precisely:  in  particular,  it  is  supposed  to  express  the  fact  that  P(I,J)  is  a  neighbor  of 
U(I,J)  and  U(I-1,J),  but  not  of  U(I+1,J),  and  similarly  for  P  and  V. 

3.  The  actual  distribution  of  the  template  (which  is  deliberately  omitted)  is  irrelevant  and 
will  be  chosen  in  a  machine-dependent  manner. 

Now,  note  that  whenever  two  data  objects  in  HPF  are  aligned  with  the  same  element 
of  a  template,  then  the  language  guarantees  that  these  objects  will  be  mapped  to  the  same 
physical  processor.  But  in  the  above  example,  all  arrays  are  aligned  with  disjoint  elements 
of  the  template.  As  a  consequence,  only  the  distribution  of  the  template  decides  the  actual, 
physical  neighborhood  relation.  For  example,  the  distribution 

!HPF$ DISTRIBUTE (CYCLIC,CYCLIC) :: T

results  in  the  worst  possible  effect,  viz.  different  processor  allocations  for  any  two  neighbors. 
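
The effect of the cyclic distribution can be checked with a small model. The following Python sketch is not HPF code; it assumes a hypothetical p x q processor grid on which template cell (s,t) of T(0:2*N,0:2*N) is owned by processor (s mod p, t mod q), and it counts how many of the four neighbors of each P(I,J) end up on the same processor. The function names are illustrative only.

```python
# Model of the template example under (CYCLIC, CYCLIC).
# Assumption: T(0:2N, 0:2N) is dealt out to a hypothetical p x q processor
# grid, so template cell (s, t) is owned by processor (s mod p, t mod q).

def cyclic_owner(s, t, p, q):
    return (s % p, t % q)

# Template cells given by the three ALIGN directives of the example.
def p_cell(i, j):
    return (2 * i - 1, 2 * j - 1)

def u_cell(i, j):
    return (2 * i, 2 * j - 1)

def v_cell(i, j):
    return (2 * i - 1, 2 * j)

def collocated_neighbors(n, p, q):
    """Count (P element, neighbor) pairs that share a processor."""
    count = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            home = cyclic_owner(*p_cell(i, j), p, q)
            neighbors = [u_cell(i - 1, j), u_cell(i, j),
                         v_cell(i, j - 1), v_cell(i, j)]
            count += sum(cyclic_owner(s, t, p, q) == home
                         for (s, t) in neighbors)
    return count
```

Under this model, and with at least two processors in each dimension, every neighbor sits one template cell away from P(I,J) in exactly one dimension, so no neighbor pair shares a processor.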

While  an  alignment  relation  between  arrays  in  a  program’s  data  space  is  a  relatively 
natural  concept,  the  template-based  code  above  does  not  establish  one.  Hence,  this  example 
is  misleading  at  best,  and  would  seem  to  point  out  a  danger  associated  with  the  template 
concept  rather  than  a  use  for  it. 

However, the user will certainly desire to specify a collocation of the arrays in the above
code or similar codes, which can be accomplished by declaring a template of size (N+1,N+1).
It is indeed not possible to correctly specify an HPF alignment (without a template) in this
situation. Our extension of the HPF alignment directive (which allows restricted usage of
MAX and MIN) will suffice to permit explicit alignment directives for many cases which
occur in practice, including this one. Otherwise, the distributions must be specified
explicitly. Given a suitable definition of the block distribution, one way to perform the
required distributions is the following:*

REAL U(0:N,1:N), V(1:N,0:N), P(1:N,1:N)
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: U, V, P

P = U(0:N-1,:) + U(1:N,:) + V(:,0:N-1) + V(:,1:N)

The  language  proposal  contained  in  this  paper  offers  a  much  more  general  solution,  by 
providing  a  generalized  form  of  block  distribution. 
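
The footnote attached to the listing above remarks that the HPF definition of BLOCK causes a problem exactly when the number of processors divides N. A one-dimensional Python sketch makes one plausible reading of that remark concrete. It assumes the HPF rule that a dimension of extent n over p processors uses blocks of size ceil(n/p), and it treats "problem" as P(i) and U(i-1) landing on different processors in the first dimension; both the reading and the function names are assumptions, not taken from the paper.

```python
def hpf_block_owner(index, lower, upper, nprocs):
    """Owner (0-based) of `index` in a dimension lower:upper under an
    HPF-style BLOCK distribution with block size ceil(extent / nprocs)."""
    extent = upper - lower + 1
    block = -(-extent // nprocs)  # integer ceiling division
    return (index - lower) // block

def misaligned(n, nprocs):
    """Number of i in 1..n for which P(i), declared 1:n, and U(i-1),
    declared 0:n, have different owners in the first dimension."""
    return sum(
        hpf_block_owner(i, 1, n, nprocs) != hpf_block_owner(i - 1, 0, n, nprocs)
        for i in range(1, n + 1)
    )
```

In this model the extents N and N+1 yield the same block size unless the number of processors divides N, in which case the blocks of U are one element longer than those of P and the two arrays drift apart.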

8.1.2  Passing  Array  Sections  to  Subroutines 

The  second  perceived  use  for  a  template  directive  was  to  permit  the  explicit  declaration  of 
mappings  of  array  sections  in  subroutines: 

REAL A(1000)
!HPF$ DISTRIBUTE A(CYCLIC(3))
CALL SUB(A(2:996:2))

SUBROUTINE SUB(X)
REAL X(:)   ! X inherits its distribution

We  assume  that  the  dummy  argument  in  subroutine  SUB  inherits  its  distribution  from 
the  actual  argument. 
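
To see what the inherited mapping looks like, the following Python sketch models the owners of the elements of X. It assumes four processors (the example does not fix a number) and the usual reading of CYCLIC(3), in which consecutive groups of three elements of A are dealt to the processors in turn; the helper names are hypothetical.

```python
def cyclic_k_owner(index, k, nprocs, lower=1):
    """Owner (0-based) of `index` under CYCLIC(k) over `nprocs` processors."""
    return ((index - lower) // k) % nprocs

def inherited_owners(start, stop, stride, k, nprocs):
    """Owners of the elements of the section A(start:stop:stride), in order."""
    return [cyclic_k_owner(a, k, nprocs) for a in range(start, stop + 1, stride)]

def matches_plain_cyclic(owners, m, nprocs):
    """True if `owners` is what a plain CYCLIC(m) on X(1:len) would give."""
    return owners == [(i // m) % nprocs for i in range(len(owners))]

# X(I) corresponds to A(2*I) for I = 1..498; nprocs=4 is an assumption.
owners = inherited_owners(2, 996, 2, k=3, nprocs=4)
expressible = any(matches_plain_cyclic(owners, m, 4)
                  for m in range(1, len(owners) + 1))
```

In this model the elements of X fall onto the processors in runs of alternating length one and two, so no plain CYCLIC(m) declaration on X reproduces the inherited mapping.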

The  question  raised  here  is: 

how  can  the  mapping  of  X  be  declared  in  SUB  if  one  wants  to  specify  it  explicitly? 

Now one will, in general, not want to explicitly specify such a distribution: the relatively
high cost associated with data movement on the current generation of parallel computers
means that a subroutine will usually be written so that it is invoked with distributed
arguments and the dummy arguments will indeed inherit the distribution from the actual
argument as above. However, just as we write one subroutine to handle arrays of different
sizes, so one expects such a subroutine to accept arrays with different distributions. In
those cases where a subroutine is important enough to warrant a specific redistribution of
its arguments, or if this should be necessary for some reason, then the language provides the
constructs required to prescribe the mappings.

*Here the Vienna Fortran definition of BLOCK is assumed. With the HPF definition, this will
cause a problem if and only if the number of processors divides N exactly.

Templates  were  seen  as  a  solution  to  the  problem  of  providing  distributions  such  as  that 
of  X  above  explicitly,  should  it  be  deemed  necessary: 

SUBROUTINE SUB(X)
!HPF$ TEMPLATE T(1000)
!HPF$ ALIGN X(I) WITH T(2*I)
!HPF$ DISTRIBUTE T(CYCLIC(3))

The  template  does  help  to  specify  this  distribution  in  the  example,  but  at  the  above- 
mentioned  cost  of  a  loss  of  generality  for  the  entire  subroutine.  Note,  further,  that  the  same 
effect  can  be  achieved  by  passing  the  entire  array  A  to  the  subroutine  and  either  using  the 
array  section  explicitly  or,  if  it  is  passed  as  a  separate  argument,  repeating  the  alignment  of 
the  argument  as  above: 

SUBROUTINE SUB(A,X)
REAL A(1000)
!HPF$ ALIGN X(I) WITH A(2*I)
!HPF$ DISTRIBUTE A *(CYCLIC(3))

(The  asterisk  indicates  that  the  distribution  of  A  is  inherited).  But  recall  that  if  there  is 
another  call  site  for  this  subroutine  with  a  different  actual  argument  for  X,  then  neither  of 
these  solutions  will  be  of  any  use.  Instead,  inquiry  functions  must  be  used  to  determine  the 
properties  of  alignments  and/or  distributions  passed  into  the  subroutine. 
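
The loss of generality can be illustrated with the same kind of model. The Python sketch below compares the owner that the hard-coded declaration (X(I) aligned with element 2*I of a CYCLIC(3) object of size 1000) implies for X(I) with the owner X(I) actually has at a given call site. The four processors and the second call site, A(1:999:2), are assumptions introduced purely for illustration.

```python
def cyclic_k_owner(index, k, nprocs, lower=1):
    """Owner (0-based) of `index` under CYCLIC(k) over `nprocs` processors."""
    return ((index - lower) // k) % nprocs

def declared_owner(i, k=3, nprocs=4):
    """Owner of X(i) implied by aligning X(I) with element 2*I."""
    return cyclic_k_owner(2 * i, k, nprocs)

def actual_owner(i, start, stride, k=3, nprocs=4):
    """Owner of X(i) when X is the section A(start::stride)."""
    return cyclic_k_owner(start + (i - 1) * stride, k, nprocs)

def declaration_matches(start, stop, stride):
    """True if the hard-coded alignment describes the section passed in."""
    n = (stop - start) // stride + 1
    return all(declared_owner(i) == actual_owner(i, start, stride)
               for i in range(1, n + 1))
```

For the call with A(2:996:2) the declaration is accurate; for the hypothetical call with A(1:999:2) it already misdescribes X(2), which is why a subroutine with several call sites has to fall back on inquiry functions.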

The current definition of HPF further attempts to facilitate the manipulation of the
distributions of sections of arrays passed to subroutines by introducing the INHERIT directive,
which further removes the need for explicit use of templates in this situation (albeit at the
cost of introducing a host of new syntactic and semantic difficulties).

The main reason for this problem is an unfortunate shortcoming of the current HPF language
specification: in contrast to, for example, Kali or Vienna Fortran, which include the concept
of user-defined distribution functions, HPF cannot explicitly describe every distribution
that it can actually generate.

8.2  Language  Problems  with  Templates 

We now reiterate the two major problems caused by templates in the HPF language
definition. Note that templates are not first-class objects of the language: in particular,
templates cannot be defined as being ALLOCATABLE. Furthermore, they cannot be passed
as arguments to subroutines.

1. Templates cannot handle allocatable arrays:

While  the  shape  of  templates  is  determined  at  entry  to  a  program  unit  and  cannot  be 
changed  afterwards,  an  allocatable  array  may  be  subject  to  multiple  ALLOCATE  and 
DEALLOCATE  statements,  where  the  extents  of  the  dimensions  associated  with  each 
instance  may  depend  on  run-time  and  input  values.  There  is  no  way  in  which  HPF 
can  establish  a  direct  relationship  between  the  shape  of  an  instance  of  an  allocatable 
array,  and  the  shape  of  an  associated  template. 

Methods to avoid this dilemma would include the definition of allocatable templates,
or of infinite templates (neither of which is a serious alternative).

2.  Templates  cannot  be  passed  across  procedure  boundaries: 

A data object whose distribution is described by a template may be passed to a
subprogram in such a way that the dummy inherits the distribution. If we need to describe
the distribution of the dummy argument, then we must be able to refer to the template
of the actual (see above example). In HPF this would require the passing of templates
to the subprogram as well. The INHERIT option for dummy arguments in the current
HPF definition tries to achieve exactly that, introducing an element of maximum
surprise for the user. The above example could be written as follows:

REAL A(1000)
!HPF$ DISTRIBUTE A(CYCLIC(3))
CALL SUB(A(2:996:2))

SUBROUTINE SUB(X)
REAL X(:)
!HPF$ INHERIT :: X
!HPF$ DISTRIBUTE X *(CYCLIC(3))

The  idea  here  is  that  the  distribution  specified  for  X  is  not  the  distribution  of  the 
dummy  argument,  i.e.,  the  distribution  of  the  array  section  A(2:996:2),  but  that  of  the 
array  associated  with  the  actual  argument. 
In contrast, the distributions defined in the language proposal of this paper (as well as
in  Vienna  Fortran)  are  considered  to  be  an  attribute  of  an  array,  and  they  are  handled 
that  way  as  well.  Even  in  the  case  of  inherited  distributions  which  cannot  be  explicitly 
specified,  inquiry  functions  can  be  used  to  determine  every  aspect  of  the  distribution 
passed  into  the  procedure. 

9  Related  Work 

Many  of  the  concepts  and  constructs  used  in  the  above  language  proposal,  and  in  the  HPF 
specification,  are  not  new.  Processor  arrays  and  the  distribution  of  data  to  them  were 
first  used  for  distributed  memory  machines  in  the  Kali  programming  language  [9].  They 
were further refined in the Vienna Fortran language, where processor arrays could also be
reshaped,  now  expressed  by  means  of  the  HPF  VIEW  attribute.  A  major  difference  in  the 
handling  of  processor  arrays  is,  however,  that  Vienna  Fortran  supports  the  mapping  of  data 
to  subsets  of  processor  arrays  and  provides  a  canonical  mapping  of  processor  arrays  to  a 
linear  processor  array,  to  facilitate  the  portability  of  code. 

The  Vienna  Fortran  language  [1,  3,  12]  is  based  both  upon  Kali  and  upon  experience 
gained  with  the  SUPERB  parallelization  system  ([7,  11,  13]);  it  provides  the  user  with  a 
wide  range  of  facilities  for  mapping  data  structures  to  processors,  including  those  proposed 
in  this  paper  and  user-defined  distributions.  Vienna  Fortran  was  the  first  language  in  which 
the  issues  of  distribution  handling  at  subroutine  boundaries  were  investigated  in  depth.  It 
introduced  the  concept  of  inheriting  and  of  enforcing  distributions  and  provided  an  attribute 
to  enable  the  user  to  make  assertions  about  the  distributions  of  actual  arguments.  This 
language  was  also  the  first  to  make  the  distinction  between  static  and  dynamic  distributions. 

Among other things, the mapping of data to subsets of processors and the inheritance
of distributions have been implemented within the framework of the Vienna Fortran
Compilation System. Two variants of the general block distribution used in this paper, but not
included in HPF, have also been implemented.

The  programming  language  Fortran  D  [6]  proposes  a  Fortran  language  extension  in  which 
the  programmer  specifies  the  distribution  of  data  by  aligning  each  array  to  a  decomposition, 
which  corresponds  to  a  template,  and  then  specifying  a  distribution  of  the  decomposition 
to  a  virtual  machine.  These  are  executable  statements,  and  array  distributions  are  dynamic 
only. 

The  Yale  Extensions  [4]  specify  the  distribution  of  arrays  in  three  stages:  alignment, 
partition  and  a  physical  map.  Because  all  these  stages  are  modeled  as  bijective  functions 
between index domains, data replication is not possible. By restricting the scope of layout
directives to phases, a block structure is imposed on Fortran 90.

Cray  Research  Inc.  has  announced  a  set  of  language  extensions  to  Cray  Fortran  (cf77)  [10] 
which  enable  the  user  to  specify  the  distribution  of  data  and  work.  They  provide  intrinsics  for 
data  distribution  and  permit  redistribution  at  subroutine  boundaries.  Further,  they  permit 
the  user  to  structure  the  executing  processors  by  giving  them  a  shape  and  weighting  the 
dimensions.  Several  methods  for  distributing  iterations  of  loops  are  provided. 

10  Conclusions 

An  approach  which  substantially  reduces  the  cost  of  developing  codes  for  distributed  memory 
parallel  machines  is  to  provide  a  set  of  extensions  for  sequential  languages  (in  particular, 
Fortran  and  C).  These  extensions  should  be  portable  across  a  wide  range  of  architectures 
and  should  suffice  for  a  wide  variety  of  algorithms.  The  methods  by  which  the  user  may 
distribute  data  to  the  processors  are  the  central  feature  of  such  a  language,  and  should  be  as 
natural  and  as  flexible  as  possible.  In  this  paper,  we  have  presented  in  detail  such  a  model 
for  distribution  and  alignment  of  data.  This  model  is  both  simpler  and  more  general  than 
the current High Performance Fortran model. In particular, it does not require a template
directive  and  has  simplified  the  passing  of  distributed  arguments  to  subroutines.  On  the 
other  hand,  the  concept  of  distribution  functions  has  been  generalized.  A  full  description  of 
the  model  described  in  this  paper  can  be  found  in  [2]. 